METHOD, APPARATUS AND NON-TRANSITORY MEDIUM FOR IMPROVED 3D AUDIO AUTHORING AND RENDERING
Patent abstract:
System and tools for improved authoring and rendering of 3D audio. The present invention relates to improved tools for authoring and rendering audio playback data. Some such authoring tools allow audio playback data to be generalized for a wide variety of playback environments. Audio playback data can be authored by creating metadata for audio objects. The metadata can be created with reference to speaker zones. During the rendering process, the audio playback data can be reproduced according to the playback speaker layout of a particular playback environment.
Publication number: BR112013033835B1
Application number: R112013033835-0
Filing date: 2012-06-27
Publication date: 2021-09-08
Inventors: Nicolas R. Tsingos; Charles Q. Robinson; Jurgen W. Scharpf
Applicant: Dolby Laboratories Licensing Corporation
Primary IPC classification:
Patent description:
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from Provisional Application No. US 61/504,005, filed July 1, 2011, and Provisional Application No. US 61/636,102, filed April 20, 2012, both of which are incorporated herein by reference in their entirety for all purposes.
TECHNICAL FIELD
[0002] This description relates to the authoring and rendering of audio playback data. In particular, this description relates to authoring and rendering audio playback data for playback environments such as cinema sound reproduction systems.
BACKGROUND
[0003] Since the introduction of sound with film in 1927, there has been a steady evolution of the technology used to capture the artistic intent of the motion picture soundtrack and to reproduce it in a cinema environment. In the 1930s, synchronized sound on disc gave way to variable-area sound on film, which was further improved in the 1940s with cinema acoustic considerations and improved speaker design, along with the early introduction of multitrack recording and steerable playback (using control tones to move sounds). In the 1950s and 1960s, magnetic striping of film allowed multichannel playback in cinemas, introducing surround channels and up to five screen channels in premium theaters.
[0004] In the 1970s, Dolby introduced noise reduction, both in post-production and on film, along with a cost-effective means of encoding and distributing mixes with 3 screen channels and a mono surround channel. Cinema sound quality was further improved in the 1980s with Dolby Spectral Recording (SR) noise reduction and certification programs such as THX. Dolby brought digital sound to cinema during the 1990s with a 5.1-channel format that provides discrete left, center and right screen channels, left and right surround arrays, and a subwoofer channel for low-frequency effects. Dolby Surround 7.1, introduced in 2010, increased the number of surround channels by dividing the existing left and right surround channels into four "zones".
[0005] As the number of channels increases and speaker layouts transition from a flat two-dimensional (2D) arrangement to a three-dimensional (3D) arrangement including elevation, the task of positioning and rendering sounds becomes more and more difficult. Improved audio authoring and rendering methods would be desirable.
SUMMARY
[0006] Some aspects of the subject matter described in this description can be implemented in tools for authoring and rendering audio playback data. Some such authoring tools allow audio playback data to be generalized for a wide variety of playback environments. According to some such implementations, audio playback data can be authored by creating metadata for audio objects. The metadata can be created with reference to speaker zones. During the rendering process, the audio playback data can be reproduced according to the playback speaker layout of a particular playback environment.
[0007] Some implementations described in this document provide an apparatus that includes an interface system and a logic system. The logic system can be configured to receive, through the interface system, audio playback data that includes one or more audio objects and associated metadata, and playback environment data. The playback environment data may include an indication of a number of playback speakers in the playback environment and an indication of the location of each playback speaker within the playback environment.
The logic system can be configured to render the audio objects into one or more speaker feed signals based, at least in part, on the associated metadata and the playback environment data, where each speaker feed signal corresponds to at least one of the playback speakers within the playback environment. The logic system can be configured to compute speaker gains that correspond to virtual speaker positions.
[0008] The playback environment can be, for example, a theater sound system environment. The playback environment can have a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration or a Hamasaki 22.2 surround sound configuration. The playback environment data can include playback speaker layout data that indicates playback speaker locations. The playback environment data may include playback speaker zone layout data that indicates playback speaker zones and playback speaker locations that correspond to the playback speaker zones.
[0009] The metadata can include information for mapping an audio object position to a single playback speaker location. Rendering can involve creating an aggregate gain based on one or more of a desired audio object position, a distance from the desired audio object position to a reference position, a velocity of an audio object, or an audio object content type. The metadata can include data for constraining an audio object's position to a one-dimensional curve or a two-dimensional surface. The metadata can include trajectory data for an audio object.
[00010] Rendering may involve imposing speaker zone restrictions. For example, the apparatus may include a user input system. According to some implementations, rendering may involve applying screen-to-room balance control based on screen-to-room balance control data received from the user input system.
[00011] The apparatus may include a display system. The logic system can be configured to control the display system to display a dynamic three-dimensional view of the playback environment.
[00012] Rendering can involve controlling audio object spread in one or more of three dimensions. Rendering can involve dynamic object blobbing in response to speaker overload. Rendering can involve mapping audio object locations to planes of speaker arrays of the playback environment.
[00013] The apparatus may include one or more non-transitory storage media, such as memory devices of a memory system. The memory devices can, for example, include random access memory (RAM), read-only memory (ROM), flash memory, one or more hard disks, etc. The interface system can include an interface between the logic system and one or more such memory devices. The interface system can also include a network interface.
[00014] The metadata can include speaker zone restriction metadata. The logic system can be configured to attenuate selected speaker feed signals by performing the following operations: computing first gains that include contributions from the selected speakers; computing second gains that do not include contributions from the selected speakers; and blending the first gains with the second gains. The logic system can be configured to determine whether to apply panning rules for an audio object position or to map an audio object position to a single speaker location. The logic system can be configured to smooth speaker gain transitions when transitioning from mapping an audio object position from a first single speaker location to a second single speaker location.
The logic system can be configured to smooth speaker gain transitions when transitioning between mapping an audio object position to a single speaker location and applying panning rules for the audio object position. The logic system can be configured to compute speaker gains for audio object positions along a one-dimensional curve between virtual speaker positions.
[00015] Some methods described in this document involve receiving audio playback data that includes one or more audio objects and associated metadata, and receiving playback environment data that includes an indication of a number of playback speakers in the playback environment. The playback environment data may include an indication of the location of each playback speaker within the playback environment. The methods can involve rendering the audio objects into one or more speaker feed signals based, at least in part, on the associated metadata. Each speaker feed signal can correspond to at least one of the playback speakers within the playback environment. The playback environment can be a theater sound system environment.
[00016] Rendering can involve creating an aggregate gain based on one or more of a desired audio object position, a distance from the desired audio object position to a reference position, a velocity of an audio object, or an audio object content type. The metadata can include data for constraining an audio object's position to a one-dimensional curve or a two-dimensional surface. Rendering can involve imposing speaker zone restrictions.
[00017] Some implementations may be manifested in one or more non-transitory media that have software stored thereon. The software may include instructions for controlling one or more devices to perform the following operations: receive audio playback data comprising one or more audio objects and associated metadata; receive playback environment data comprising an indication of a number of playback speakers in the playback environment and an indication of the location of each playback speaker within the playback environment; and render the audio objects into one or more speaker feed signals based, at least in part, on the associated metadata. Each speaker feed signal can correspond to at least one of the playback speakers within the playback environment. The playback environment can, for example, be a theater sound system environment.
[00018] Rendering can involve creating an aggregate gain based on one or more of a desired audio object position, a distance from the desired audio object position to a reference position, a velocity of an audio object, or an audio object content type. The metadata can include data for constraining an audio object's position to a one-dimensional curve or a two-dimensional surface. Rendering can involve imposing speaker zone restrictions. Rendering can involve dynamic object blobbing in response to speaker overload.
[00019] Alternative devices and apparatus are described in this document. Some such apparatus may include an interface system, a user input system and a logic system. The logic system can be configured to receive audio data through the interface system, to receive a position of an audio object through the user input system or the interface system, and to determine a position of the audio object in a three-dimensional space. The determination may involve constraining the position to a one-dimensional curve or a two-dimensional surface within the three-dimensional space.
The logic system can be configured to create metadata associated with the audio object based, at least in part, on user input received through the user input system, the metadata including data indicating the position of the audio object in the three-dimensional space.
[00020] The metadata can include trajectory data indicating a time-varying position of the audio object within the three-dimensional space. The logic system can be configured to compute trajectory data according to user input received through the user input system. The trajectory data can include a set of positions within the three-dimensional space at multiple instances of time. The trajectory data can include a start position, velocity data and acceleration data. The trajectory data can include a start position and an equation that defines positions in three-dimensional space and corresponding times.
[00021] The apparatus may include a display system. The logic system can be configured to control the display system to display an audio object trajectory according to the trajectory data.
[00022] The logic system can be configured to create speaker zone restriction metadata according to user input received through the user input system. The speaker zone restriction metadata can include data for disabling selected speakers. The logic system can be configured to create speaker zone restriction metadata by mapping an audio object position to a single speaker.
[00023] The apparatus may include a sound reproduction system. The logic system can be configured to control the sound reproduction system, at least in part, according to the metadata.
[00024] The position of the audio object can be constrained to a one-dimensional curve. The logic system can be further configured to create virtual speaker positions along the one-dimensional curve.
[00025] Alternative methods are described in this document. Some such methods involve receiving audio data, receiving a position of an audio object, and determining a position of the audio object in a three-dimensional space. The determination may involve constraining the position to a one-dimensional curve or a two-dimensional surface within the three-dimensional space. The methods can involve creating metadata associated with the audio object based, at least in part, on user input.
[00026] The metadata can include data indicating the position of the audio object in the three-dimensional space. The metadata can include trajectory data indicating a time-varying position of the audio object within the three-dimensional space. Creating the metadata may involve creating speaker zone restriction metadata, for example, based on user input. The speaker zone restriction metadata can include data for disabling selected speakers.
[00027] The position of the audio object can be constrained to a one-dimensional curve. The methods may involve creating virtual speaker positions along the one-dimensional curve.
[00028] Other aspects of this description can be implemented in one or more non-transitory media that have software stored thereon. The software may include instructions for controlling one or more devices to perform the following operations: receive audio data; receive a position of an audio object; and determine a position of the audio object in a three-dimensional space. The determination may involve constraining the position to a one-dimensional curve or a two-dimensional surface within the three-dimensional space. The software may include instructions for controlling one or more devices to create metadata associated with the audio object.
The metadata can be created based, at least in part, on user input.
[00029] The metadata can include data indicating the position of the audio object in the three-dimensional space. The metadata can include trajectory data indicating a time-varying position of the audio object within the three-dimensional space. Creating the metadata may involve creating speaker zone restriction metadata, for example, based on user input. The speaker zone restriction metadata can include data for disabling selected speakers.
[00030] The position of the audio object can be constrained to a one-dimensional curve. The software may include instructions for controlling one or more devices to create virtual speaker positions along the one-dimensional curve.
[00031] Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and in the description below. Other features, aspects and advantages will become apparent from the description, the drawings and the claims. Note that the relative dimensions of the following Figures may not be drawn to scale.
BRIEF DESCRIPTION OF THE DRAWINGS
[00032] Figure 1 shows an example of a playback environment that has a Dolby Surround 5.1 configuration.
[00033] Figure 2 shows an example of a playback environment that has a Dolby Surround 7.1 configuration.
[00034] Figure 3 shows an example of a playback environment that has a Hamasaki 22.2 surround sound configuration.
[00035] Figure 4A shows an example of a graphical user interface (GUI) that depicts speaker zones at varying elevations in a virtual playback environment.
[00036] Figure 4B shows an example of another playback environment.
[00037] Figures 5A to 5C show examples of speaker responses that correspond to an audio object that has a position that is restricted to a two-dimensional surface of a three-dimensional space.
[00038] Figures 5D and 5E show examples of two-dimensional surfaces to which an audio object can be restricted.
[00039] Figure 6A is a flowchart that outlines an example of a process of restricting the positions of an audio object to a two-dimensional surface.
[00040] Figure 6B is a flowchart that outlines an example of a process for mapping an audio object position to a single speaker location or a single speaker zone.
[00041] Figure 7 is a flowchart that outlines a process of establishing and using virtual speakers.
[00042] Figures 8A to 8C show examples of virtual speakers mapped to line endpoints and corresponding speaker responses.
[00043] Figures 9A to 9C show examples of using a virtual cable to move an audio object.
[00044] Figure 10A is a flowchart that outlines a process of using a virtual cable to move an audio object.
[00045] Figure 10B is a flowchart that outlines an alternative process of using a virtual cable to move an audio object.
[00046] Figures 10C to 10E show examples of the process outlined in Figure 10B.
[00047] Figure 11 shows an example of applying speaker zone restriction in a virtual playback environment.
[00048] Figure 12 is a flowchart that outlines some examples of applying speaker zone restriction rules.
[00049] Figures 13A and 13B show an example of a GUI that can switch between a two-dimensional view and a three-dimensional view of a virtual playback environment.
[00050] Figures 13C to 13E show combinations of two-dimensional and three-dimensional depictions of playback environments.
[00051] Figure 14A is a flowchart that outlines a process of controlling an apparatus to present GUIs such as those shown in Figures 13C to 13E.
[00052] Figure 14B is a flowchart that outlines a process of rendering audio objects for a playback environment.
[00053] Figure 15A shows an example of an audio object and associated audio object width in a virtual playback environment.
[00054] Figure 15B shows an example of a spread profile corresponding to the audio object width shown in Figure 15A.
[00055] Figure 16 is a flowchart that outlines a process of blobbing audio objects.
[00056] Figures 17A and 17B show examples of an audio object positioned in a three-dimensional virtual playback environment.
[00057] Figure 18 shows examples of zones that correspond to panning modes.
[00058] Figures 19A to 19D show examples of applying near-field and far-field panning techniques to audio objects at different locations.
[00059] Figure 20 indicates speaker zones of a playback environment that can be used in a screen-to-room balance control process.
[00060] Figure 21 is a block diagram that provides examples of components of an authoring and/or rendering apparatus.
[00061] Figure 22A is a block diagram that represents some components that can be used for audio content creation.
[00062] Figure 22B is a block diagram that represents some components that can be used for audio playback in a playback environment.
[00063] Reference numerals and similar designations in the various drawings indicate the same elements.
DESCRIPTION OF EXAMPLE EMBODIMENTS
[00064] The following description is directed to certain implementations for the purposes of describing some innovative aspects of this description, as well as examples of contexts in which these innovative aspects can be implemented. However, the teachings in this document can be applied in a number of different ways. For example, although various implementations have been described in terms of particular playback environments, the teachings in this document are broadly applicable to other known playback environments, as well as playback environments that may be introduced in the future. Similarly, while examples of graphical user interfaces (GUIs) are presented in this document, some of which provide examples of speaker locations, speaker zones, etc., other implementations are contemplated by the inventors. Furthermore, the described implementations may be embodied in various authoring and/or rendering tools, which in turn may be implemented in a variety of hardware, software, firmware, etc. Accordingly, the teachings of this description are not intended to be limited to the implementations shown in the Figures and/or described herein, but rather have wide applicability.
[00065] Figure 1 shows an example of a playback environment that has a Dolby Surround 5.1 configuration. Dolby Surround 5.1 was developed in the 1990s, but this configuration is still widely used in theater sound system environments. A projector 105 can be configured to project video images, for example for a movie, onto the screen 150. The audio playback data can be synchronized with the video images and processed by the sound processor 110. The power amplifiers 115 can provide speaker feed signals to the speakers of the playback environment 100.
[00066] The Dolby Surround 5.1 configuration includes a left surround array 120 and a right surround array 125, each of which is driven by a single channel. The Dolby Surround 5.1 configuration also includes separate channels for the left screen channel 130, the center screen channel 135 and the right screen channel 140.
A separate channel for the subwoofer 145 is provided for low-frequency effects (LFE).
[00067] In 2010, Dolby provided enhancements to digital cinema sound with the introduction of Dolby Surround 7.1. Figure 2 shows an example of a playback environment that has a Dolby Surround 7.1 configuration. A digital projector 205 can be configured to receive digital video data and to project video images onto the screen 150. Audio playback data can be processed by the sound processor 210. Power amplifiers 215 can provide speaker feed signals to the speakers of the playback environment 200.
[00068] The Dolby Surround 7.1 configuration includes a left side surround array 220 and a right side surround array 225, each of which can be driven by a single channel. Like Dolby Surround 5.1, the Dolby Surround 7.1 configuration includes separate channels for the left screen channel 230, the center screen channel 235, the right screen channel 240 and the subwoofer 245. However, Dolby Surround 7.1 increases the number of surround channels by dividing the left and right surround channels of Dolby Surround 5.1 into four zones: in addition to the left side surround array 220 and the right side surround array 225, separate channels are included for the left rear surround speakers 224 and the right rear surround speakers 226. Increasing the number of surround zones within the playback environment 200 can significantly improve sound localization.
[00069] In an effort to create a more immersive environment, some playback environments may be configured with increased numbers of speakers, driven by increased numbers of channels. Furthermore, some playback environments may include speakers deployed at various elevations, some of which may be above a seating area of the playback environment.
[00070] Figure 3 shows an example of a playback environment that has a Hamasaki 22.2 surround sound configuration. Hamasaki 22.2 was developed at the NHK Science & Technology Research Laboratories (NHK) in Japan as the surround sound component of Ultra High Definition Television. Hamasaki 22.2 provides 24 speaker channels, which can be used to drive speakers arranged in three layers. The upper speaker layer 310 of the playback environment 300 can be driven by 9 channels. The middle speaker layer 320 can be driven by 10 channels. The lower speaker layer 330 can be driven by 5 channels, two of which are for the subwoofers 345a and 345b.
[00071] Consequently, the modern trend is to include not only more speakers and more channels, but also speakers at different heights. As the number of channels increases and the speaker layout transitions from a 2D arrangement to a 3D arrangement, the tasks of positioning and rendering sounds become increasingly difficult.
[00072] This description provides various tools, as well as related user interfaces, that increase functionality and/or reduce authoring complexity for a 3D audio sound system.
[00073] Figure 4A shows an example of a graphical user interface (GUI) that depicts speaker zones at varying elevations in a virtual playback environment. The GUI 400 can, for example, be displayed on a display device in accordance with instructions from a logic system, in accordance with signals received from user input devices, etc. Some such devices are described below with reference to Figure 21.
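By way of illustration only, the nine speaker zones of the virtual playback environment 404 shown in Figure 4A (and described in the two paragraphs that follow) could be represented in software with a small structure such as the Python sketch below; the class and field names are hypothetical and are not taken from the patent.

from dataclasses import dataclass

# Purely illustrative representation of the speaker zones of the virtual
# playback environment 404: seven zones at a first elevation and two at a
# second, ceiling-level elevation. Field names are assumptions.
@dataclass
class SpeakerZone:
    zone_id: int
    area: str          # e.g. "front", "left", "right", "rear left", "upper"
    elevation: int     # 0 = first (listener-level) elevation, 1 = second elevation

VIRTUAL_PLAYBACK_ENVIRONMENT_404 = [
    SpeakerZone(1, "front", 0), SpeakerZone(2, "front", 0), SpeakerZone(3, "front", 0),
    SpeakerZone(4, "left", 0), SpeakerZone(5, "right", 0),
    SpeakerZone(6, "rear left", 0), SpeakerZone(7, "rear right", 0),
    SpeakerZone(8, "upper left", 1), SpeakerZone(9, "upper right", 1),
]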
[00074] As used herein with reference to virtual playback environments such as the virtual playback environment 404, the term "speaker zone" generally refers to a logical construct that may or may not have a one-to-one correspondence with a playback speaker of an actual playback environment. For example, a "speaker zone location" may or may not correspond to a particular playback speaker location of a cinema playback environment. Instead, the term "speaker zone location" can generally refer to a zone of a virtual playback environment. In some implementations, a speaker zone of a virtual playback environment can correspond to a virtual speaker, for example, through the use of virtualization technology such as Dolby Headphone(TM) (sometimes referred to as Mobile Surround(TM)), which creates a virtual surround sound environment in real time using a set of two-channel stereo headphones. In the GUI 400, there are seven speaker zones 402a at a first elevation and two speaker zones 402b at a second elevation, for a total of nine speaker zones in the virtual playback environment 404. In this example, speaker zones 1 to 3 are in the front area 405 of the virtual playback environment 404. The front area 405 can correspond, for example, to an area of a cinema playback environment in which a screen 150 is located, to an area of a house in which a television screen is located, etc.
[00075] Here, speaker zone 4 generally corresponds to speakers in the left area 410 and speaker zone 5 corresponds to speakers in the right area 415 of the virtual playback environment 404. Speaker zone 6 corresponds to a left rear area 412 and speaker zone 7 corresponds to a right rear area 414 of the virtual playback environment 404. Speaker zone 8 corresponds to speakers in an upper area 420a and speaker zone 9 corresponds to speakers in an upper area 420b, which can be a virtual ceiling area such as the virtual ceiling area 520 shown in Figures 5D and 5E. Consequently, and as described in more detail below, the speaker zone locations 1 to 9 shown in Figure 4A may or may not correspond to the playback speaker locations of an actual playback environment. Additionally, other implementations may include more or fewer speaker zones and/or elevations.
[00076] In the various implementations described in this document, a user interface such as the GUI 400 can be used as part of an authoring tool and/or a rendering tool. In some implementations, the authoring tool and/or rendering tool may be implemented through software stored in one or more non-transitory media. The authoring tool and/or rendering tool may be implemented (at least in part) by hardware, firmware, etc., such as the logic system and other devices described below with reference to Figure 21. In some authoring implementations, an associated authoring tool can be used to create metadata for associated audio data. The metadata can, for example, include data indicating the position and/or trajectory of an audio object in a three-dimensional space, speaker zone restriction data, etc. The metadata can be created with respect to the speaker zones 402 of the virtual playback environment 404, rather than with respect to a particular speaker arrangement of an actual playback environment. A rendering tool can receive audio data and associated metadata, and can compute audio gains and speaker feed signals for a playback environment. Such audio gains and speaker feed signals can be computed according to an amplitude panning process, which can create a perception that a sound is coming from a position P in the playback environment.
For example, speaker feed signals can be provided to playback speakers 1 through N of the playback environment according to the following equation:

x_i(t) = g_i x(t), i = 1, ..., N (Equation 1)

[0077] In Equation 1, x_i(t) represents the speaker feed signal to be applied to speaker i, g_i represents the gain factor of the corresponding channel, x(t) represents the audio signal and t represents time. The gain factors can be determined, for example, in accordance with the amplitude panning methods described in Section 2, pages 3 and 4, of V. Pulkki, Compensating Displacement of Amplitude-Panned Virtual Sources (Audio Engineering Society (AES) International Conference on Virtual, Synthetic and Entertainment Audio), which is hereby incorporated by reference. In some implementations, the gains can be frequency dependent. In some implementations, a time delay can be introduced by replacing x(t) with x(t-Δt).
[0078] In some rendering implementations, audio playback data created with reference to the speaker zones 402 can be mapped to speaker locations of a wide range of playback environments, which may be in a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration, a Hamasaki 22.2 configuration, or another configuration. For example, referring to Figure 2, a rendering tool can map the audio playback data for speaker zones 4 and 5 to the left side surround array 220 and to the right side surround array 225 of a playback environment that has a Dolby Surround 7.1 configuration. The audio playback data for speaker zones 1, 2 and 3 can be mapped to the left screen channel 230, the right screen channel 240 and the center screen channel 235, respectively. Audio playback data for speaker zones 6 and 7 can be mapped to the left rear surround speakers 224 and the right rear surround speakers 226.
[0079] Figure 4B shows an example of another playback environment. In some implementations, a rendering tool can map the audio playback data for speaker zones 1, 2 and 3 to corresponding screen speakers 455 of the playback environment 450. A rendering tool can map the audio playback data for speaker zones 4 and 5 to the left side surround array 460 and the right side surround array 465, and can map the audio playback data for speaker zones 8 and 9 to the left overhead speakers 470a and the right overhead speakers 470b. Audio playback data for speaker zones 6 and 7 can be mapped to the left rear surround speakers 480a and the right rear surround speakers 480b.
[00080] In some authoring implementations, an authoring tool can be used to create metadata for audio objects. As used in this document, the term "audio object" can refer to a stream of audio data and associated metadata. The metadata typically indicates the 3D position of the object, rendering constraints, as well as content type (e.g. dialog, effects, etc.). Depending on the implementation, the metadata may include other types of data, such as width data, gain data, trajectory data, etc. Some audio objects may be static, while others move. Audio object details can be authored or rendered according to the associated metadata which, among other things, can indicate the position of the audio object in three-dimensional space at a given point in time. When audio objects are monitored or played back in a playback environment, the audio objects can be rendered according to the positional metadata using the playback speakers that are present in the playback environment, rather than being output to a predetermined physical channel, as is the case with traditional channel-based systems such as Dolby 5.1 and Dolby 7.1.
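To make Equation 1 and the audio object description above concrete, the following Python sketch computes speaker feed signals from a set of gain factors and shows one common way such gains can be derived for a pair of speakers. It is a minimal illustration only: the dictionary layout of the metadata, the variable names and the pairwise sine/cosine law are assumptions, not the patent's prescribed implementation.

import numpy as np

# Hypothetical metadata record for one audio object, loosely following
# paragraph [00080]; the field names are assumptions, not the patent's format.
audio_object = {
    "position": (0.25, 0.9, 0.0),   # (x, y, z) in the virtual playback environment
    "content_type": "effects",
    "width": 0.0,
}

def speaker_feeds(x, gains):
    """Equation 1: x_i(t) = g_i * x(t) for speakers i = 1..N.

    x      : 1-D array holding the audio signal x(t)
    gains  : length-N array of per-speaker gain factors g_i
    returns: N x len(x) array of speaker feed signals
    """
    return np.outer(gains, x)

# One way gain factors might be derived for a pair of speakers: an
# energy-preserving sine/cosine law over a pan position p in [0, 1]
# (0 = first speaker, 1 = second speaker).
def pairwise_gains(p):
    theta = p * np.pi / 2.0
    return np.array([np.cos(theta), np.sin(theta)])   # g_1^2 + g_2^2 == 1

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 48000, endpoint=False)
    x = np.sin(2.0 * np.pi * 440.0 * t)               # 1 s, 440 Hz test signal
    g = pairwise_gains(0.5)                           # object halfway between the pair
    feeds = speaker_feeds(x, g)
    print(feeds.shape, g)                             # (2, 48000) [0.7071... 0.7071...]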
[00081] Various authoring and rendering tools are described in this document with reference to a GUI that is substantially the same as the GUI 400. However, various other user interfaces, including but not limited to GUIs, can be used in association with these authoring and rendering tools. Some such tools can simplify the authoring process by applying various types of restrictions. Some implementations will now be described with reference to Figures 5A et seq.
[00082] Figures 5A to 5C show examples of speaker responses that correspond to an audio object that has a position that is restricted to a two-dimensional surface of a three-dimensional space, which is a hemisphere in this example. In these examples, the speaker responses were computed by a renderer that assumes a 9-speaker configuration, with each speaker corresponding to one of speaker zones 1 to 9. However, as noted elsewhere in this document, there generally may not be a one-to-one mapping between speaker zones of a virtual playback environment and playback speakers of a playback environment. Referring first to Figure 5A, the audio object 505 is shown at a location in the left front portion of the virtual playback environment 404. Consequently, the speaker corresponding to speaker zone 1 indicates a substantial gain and the speakers corresponding to speaker zones 3 and 4 indicate moderate gains.
[00083] In this example, the location of the audio object 505 can be changed by placing a cursor 510 on the audio object 505 and "dragging" the audio object 505 to a desired location in the x,y plane of the virtual playback environment 404. As the object is dragged towards the middle of the playback environment, it is also mapped to the surface of a hemisphere and its elevation increases. Here, increases in the elevation of the audio object 505 are indicated by an increase in the diameter of the circle that represents the audio object 505: as shown in Figures 5B and 5C, as the audio object 505 is dragged to the top center of the virtual playback environment 404, the audio object 505 appears increasingly larger. Alternatively or additionally, the elevation of the audio object 505 can be indicated by changes in color, brightness, a numerical elevation indication, etc. When the audio object 505 is positioned at the top center of the virtual playback environment 404, as shown in Figure 5C, the speakers corresponding to speaker zones 8 and 9 indicate substantial gains and the other speakers indicate little or no gain.
[00084] In this implementation, the position of the audio object 505 is restricted to a two-dimensional surface, such as a spherical surface, an elliptical surface, a conical surface, a cylindrical surface, a wedge, etc. Figures 5D and 5E show examples of two-dimensional surfaces to which an audio object can be restricted. Figures 5D and 5E are cross-sectional views through the virtual playback environment 404, with the front area 405 shown on the left. In Figures 5D and 5E, the y values of the y-z axes increase in the direction of the front area 405 of the virtual playback environment 404, to retain consistency with the orientations of the x-y axes shown in Figures 5A to 5C.
[00085] In the example shown in Figure 5D, the two-dimensional surface 515a is a section of an ellipsoid. In the example shown in Figure 5E, the two-dimensional surface 515b is a section of a wedge. However, the shapes, orientations and positions of the two-dimensional surfaces 515 shown in Figures 5D and 5E are merely examples.
In alternative implementations, at least a portion of the two-dimensional surface 515 may extend outside the virtual playback environment 404. In some such implementations, the two-dimensional surface 515 may extend above the virtual ceiling 520. Consequently, the three-dimensional space within which the two-dimensional surface 515 extends is not necessarily coextensive with the volume of the virtual playback environment 404. In still other implementations, an audio object may be restricted to one-dimensional features such as curves, straight lines, etc.
[00086] Figure 6A is a flowchart that outlines an example of a process of restricting the positions of an audio object to a two-dimensional surface. As with other flowcharts provided in this document, the operations of process 600 are not necessarily performed in the order shown. Furthermore, process 600 (and other processes provided herein) may include more or fewer operations than those indicated in the drawings and/or described. In this example, blocks 605 to 622 are performed by an authoring tool and blocks 624 to 630 are performed by a rendering tool. The authoring tool and the rendering tool can be implemented on a single device or on more than one device. Although Figure 6A (and other flowcharts provided in this document) may create the impression that the authoring and rendering processes are performed sequentially, in many implementations the authoring and rendering processes are performed at substantially the same time. Authoring processes and rendering processes can be interactive. For example, the results of an authoring operation can be sent to the rendering tool, the corresponding results of the rendering tool can be evaluated by a user, who can perform further authoring based on these results, etc.
[00087] At block 605, an indication is received that an audio object position is to be constrained to a two-dimensional surface. The indication can, for example, be received by a logic system of an apparatus that is configured to provide authoring and/or rendering tools. As with other implementations described in this document, the logic system can operate according to software instructions stored in a non-transitory medium, according to firmware, etc. The indication can be a signal from a user input device (such as a touch screen, mouse, trackball, gesture recognition device, etc.) in response to input from a user.
[00088] At optional block 607, audio data is received. Block 607 is optional in this example, as the audio data can also go directly to a renderer from another source (for example, a mixing console) that is time-synchronized to the metadata authoring tool. In some such implementations, an implicit mechanism may exist to tie each audio data stream to a corresponding incoming metadata stream to form an audio object. For example, the metadata stream may contain an identifier for the audio object that it represents, for example, a numerical value from 1 to N. If the rendering apparatus is configured with audio inputs that are also numbered from 1 to N, the rendering tool can automatically assume that an audio object is formed by the metadata stream identified with a numerical value (e.g. 1) and the audio data received at the first audio input. Similarly, any metadata stream identified as number 2 can form an object with the audio received at the second audio input channel. In some implementations, audio and metadata can be pre-packaged by the authoring tool to form audio objects, and the audio objects can be provided to the rendering tool, for example, sent over a network as TCP/IP packets.
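A minimal sketch of the kind of implicit pairing described in paragraph [00088] follows; the class names, fields and the dictionary of audio inputs are hypothetical conveniences, not structures defined by the patent.

from dataclasses import dataclass

@dataclass
class MetadataFrame:
    object_id: int                  # 1..N, matches an audio input number
    position: tuple                 # (x, y, z) in the virtual playback environment

@dataclass
class AudioObject:
    object_id: int
    metadata: MetadataFrame
    audio: list                     # audio samples from the matching input

def pair_streams(metadata_frames, audio_inputs):
    """Form audio objects by matching each metadata object_id to the audio
    arriving at the input of the same number."""
    objects = []
    for frame in metadata_frames:
        samples = audio_inputs.get(frame.object_id)
        if samples is not None:
            objects.append(AudioObject(frame.object_id, frame, samples))
    return objects

# Example: metadata stream 1 pairs with audio input 1, stream 2 with input 2.
frames = [MetadataFrame(1, (0.1, 0.9, 0.0)), MetadataFrame(2, (0.8, 0.2, 0.5))]
inputs = {1: [0.0, 0.1, 0.2], 2: [0.3, 0.2, 0.1]}
print([obj.object_id for obj in pair_streams(frames, inputs)])   # [1, 2]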
[00089] In alternative implementations, the authoring tool can send only metadata over the network, and the rendering tool can receive audio from another source (for example, via a pulse code modulation (PCM) stream, via analog audio, etc.). In such implementations, the rendering tool can be configured to group the audio data and metadata to form audio objects. The audio data can, for example, be received by the logic system via an interface. The interface can be, for example, a network interface, an audio interface (for example, an interface configured for communication via the AES3 standard developed by the Audio Engineering Society and the European Broadcasting Union, also known as AES/EBU, via the Multichannel Audio Digital Interface (MADI) protocol, via analog signals, etc.) or an interface between the logic system and a memory device. In this example, the data received by the renderer includes at least one audio object.
[00090] At block 610, the (x,y) or (x,y,z) coordinates of an audio object position are received. Block 610 may, for example, involve receiving an initial position of the audio object. Block 610 may also involve receiving an indication that a user has positioned or repositioned the audio object, for example, as described above with reference to Figures 5A to 5C. The coordinates of the audio object are mapped to a two-dimensional surface at block 615. The two-dimensional surface may be similar to one described above with reference to Figures 5D and 5E, or it may be a different two-dimensional surface. In this example, each point of the x-y plane will be mapped to a single z value, so block 615 involves mapping the x and y coordinates received in block 610 to a z value. In other implementations, different mapping processes and/or coordinate systems may be used. The audio object can be displayed (block 620) at the (x,y,z) location that is determined in block 615. The audio data and metadata, including the mapped (x,y,z) location determined in block 615, can be stored at block 621. The audio data and metadata can be sent to a rendering tool (block 622). In some implementations, the metadata can be sent continuously while some authoring operations are performed, e.g. while the audio object is positioned, constrained, displayed in the GUI 400, etc.
[00091] At block 623, it is determined whether the authoring process will continue. For example, the authoring process may end (block 625) upon receiving input from a user interface indicating that a user no longer wishes to constrain audio object positions to a two-dimensional surface. Otherwise, the authoring process may continue, for example by reverting to block 607 or block 610. In some implementations, rendering operations may continue whether or not the authoring process continues. In some implementations, audio objects can be recorded to disk on the authoring platform and then played back from a dedicated sound processor or a movie server connected to a sound processor, for example, a sound processor similar to the sound processor 210 of Figure 2, for display purposes.
[00092] In some implementations, the rendering tool may be software that runs on a device that is configured to provide authoring functionality. In other implementations, the rendering tool may be provided on another device. The type of communication protocol used for communication between the authoring tool and the rendering tool can vary depending on whether both tools run on the same device or whether they communicate over a network.
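As one concrete, purely illustrative reading of block 615 above, where each (x, y) point maps to a single z value, the following Python sketch maps positions onto a hemisphere like the one used for Figures 5A to 5C; the hemisphere centre, the radius and the clamping of points outside the rim are assumptions rather than details taken from the patent.

import math

def map_to_hemisphere(x, y, cx=0.5, cy=0.5, radius=math.sqrt(0.5)):
    """Map an (x, y) position to a single z value on a hemispherical surface.

    Illustrates the block 615 idea that every point of the x-y plane maps to
    one z value; positions at or beyond the rim are clamped to z = 0.
    """
    dx, dy = x - cx, y - cy
    z2 = radius * radius - (dx * dx + dy * dy)
    return math.sqrt(z2) if z2 > 0.0 else 0.0

# Dragging the object towards the middle of the environment raises its
# elevation, consistent with the behavior described for Figures 5A to 5C.
print(map_to_hemisphere(0.1, 0.9))   # near a front corner: low elevation
print(map_to_hemisphere(0.5, 0.5))   # top centre: maximum elevation (z == radius)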
[00093] At block 626, the audio data and metadata (including the (x,y,z) positions determined at block 615) are received by the rendering tool. In alternative implementations, the audio data and metadata can be received separately and interpreted by the rendering tool as an audio object through an implicit mechanism. As noted above, for example, a metadata stream can contain an audio object identification code (e.g. 1, 2, 3, etc.) and can be paired respectively with the first, second and third audio inputs (i.e., digital or analog audio connections) of the rendering system to form an audio object that can be rendered to the speakers.
[00094] During the rendering operations of process 600 (and the other rendering operations described in this document), panning gain equations can be applied according to the playback speaker layout of a particular playback environment. Consequently, the logic system of the rendering tool can receive playback environment data comprising an indication of a number of playback speakers in the playback environment and an indication of the location of each playback speaker within the playback environment. These data can be received, for example, by accessing a data structure that is stored in a memory accessible by the logic system, or received through an interface system.
[00095] In this example, the panning gain equations are applied to the (x,y,z) positions to determine gain values (block 628) to apply to the audio data (block 630). In some implementations, audio data that has been level-adjusted according to the gain values can be played back through the playback speakers, for example, headphone speakers (or other speakers) that are configured for communication with a logic system of the rendering tool. In some implementations, the playback speaker locations may correspond to the speaker zone locations of a virtual playback environment, such as the virtual playback environment 404 described above. The corresponding speaker responses can be displayed on a display device, for example, as shown in Figures 5A to 5C.
[00096] At block 635, it is determined whether the process will continue. For example, the process may end (block 640) upon receiving input from a user interface indicating that a user no longer wishes to continue the rendering process. Otherwise, the process may continue, for example, by reverting to block 626. If the logic system receives an indication that the user wishes to revert to the corresponding authoring process, process 600 may revert to block 607 or block 610.
[00097] Other implementations may involve imposing various other types of restrictions and creating other types of restriction metadata for audio objects. Figure 6B is a flowchart that outlines an example of a process for mapping an audio object position to a single speaker location. This process may also be referred to in this document as "snapping". At block 655, an indication is received that an audio object position may be snapped to a single speaker location or a single speaker zone. In this example, the indication is that the audio object position will be snapped to a single speaker location when appropriate. The indication can, for example, be received by a logic system of an apparatus that is configured to provide authoring tools.
The indication may correspond to input received from a user input device. However, the indication can also correspond to an audio object category (e.g. a bullet sound, a vocalization, etc.) and/or an audio object width. Information regarding category and/or width can, for example, be received as metadata for the audio object. In such implementations, block 657 may occur before block 655.
[00098] At block 656, audio data is received. The coordinates of an audio object position are received at block 657. In this example, the audio object position is displayed (block 658) according to the coordinates received at block 657. Metadata that includes the audio object coordinates and a snap flag, indicating the snapping functionality, is saved at block 659. The audio data and metadata are sent by the authoring tool to a rendering tool (block 660).
[00099] At block 662, it is determined whether the authoring process will continue. For example, the authoring process may end (block 663) upon receiving input from a user interface indicating that a user no longer wishes to snap audio object positions to a speaker location. Otherwise, the authoring process can continue, for example by reverting to block 665. In some implementations, rendering operations can continue whether or not the authoring process continues.
[000100] The audio data and metadata sent by the authoring tool are received by the rendering tool at block 664. At block 665, it is determined (e.g. by the logic system) whether to snap the audio object position to a speaker location. This determination can be based, at least in part, on the distance between the audio object position and the closest playback speaker location of a playback environment.
[000101] In this example, if it is determined at block 665 to snap the audio object position to a speaker location, the audio object position will be mapped to a speaker location at block 670, usually the one closest to the intended (x, y, z) position received for the audio object. In this case, the gain for audio data played back through that speaker location will be 1.0, while the gain for audio data played back through the other speakers will be zero. In alternative implementations, the audio object position can be mapped to a group of speaker locations at block 670.
[000102] For example, referring again to Figure 4B, block 670 may involve snapping the position of the audio object to one of the left overhead speakers 470a.
[000103] Alternatively, block 670 may involve snapping the position of the audio object to a single speaker and neighboring speakers, for example 1 or 2 neighboring speakers. Consequently, the corresponding metadata can apply to a small group of playback speakers and/or to an individual playback speaker.
[000104] However, if it is determined at block 665 that the audio object position will not be snapped to a speaker location, for example, if this would result in a large discrepancy in position relative to the original intended position received for the object, panning rules will be applied (block 675). The panning rules can be applied according to the position of the audio object as well as other characteristics of the audio object (such as width, volume, etc.).
[000105] The gain data determined at block 675 can be applied to the audio data at block 681 and the result can be saved. In some implementations, the resulting audio data can be played back through speakers that are configured to communicate with the logic system. If it is determined at block 685 that process 650 will continue, process 650 may revert to block 664 to continue the rendering operations. Alternatively, process 650 may revert to block 655 to resume the authoring operations.
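The following Python sketch illustrates the snap-or-pan decision of blocks 665 to 675: if the nearest playback speaker is close enough to the intended object position, that speaker receives a gain of 1.0 and all others receive zero; otherwise a panning function is applied. The distance threshold and the placeholder panning function are assumptions; the patent does not specify either.

import math

def snap_or_pan(obj_pos, speaker_positions, pan_fn, snap_threshold=0.15):
    """Decide, in the spirit of blocks 665-675, whether to snap an object to one speaker.

    obj_pos           : intended (x, y, z) position of the audio object
    speaker_positions : list of (x, y, z) playback speaker locations
    pan_fn            : fallback panning function returning one gain per speaker
    snap_threshold    : assumed maximum snapping distance (not from the patent)
    """
    distances = [math.dist(obj_pos, spk) for spk in speaker_positions]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    if distances[nearest] <= snap_threshold:
        # Block 670: gain 1.0 for the nearest speaker, zero for all others.
        return [1.0 if i == nearest else 0.0 for i in range(len(speaker_positions))]
    # Block 675: snapping would cause too large a position discrepancy, so pan instead.
    return pan_fn(obj_pos, speaker_positions)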
[000106] Process 650 can involve various types of smoothing operations. For example, the logic system can be configured to smooth transitions in the gains applied to the audio data when transitioning from mapping an audio object position from a first single speaker location to a second single speaker location. Referring again to Figure 4B, if the position of the audio object were initially mapped to one of the left overhead speakers 470a and later mapped to one of the right rear surround speakers 480b, the logic system can be configured to smooth the transition between speakers so that the audio object does not appear to suddenly "jump" from one speaker (or speaker zone) to another. In some implementations, the smoothing can be implemented according to a crossfade rate parameter.
[000107] In some implementations, the logic system can be configured to smooth transitions in the gains applied to the audio data when transitioning between mapping an audio object position to a single speaker location and applying panning rules for the audio object position. For example, if it were subsequently determined at block 665 that the position of the audio object had been moved to a position determined to be too far from the nearest speaker, panning rules for the audio object position could be applied at block 675. However, when transitioning from snapping to panning (or vice versa), the logic system can be configured to smooth the transitions in the gains applied to the audio data. The process can end at block 690, for example, upon receipt of a corresponding input from a user interface.
[000108] Some alternative implementations may involve creating logical constraints. In some cases, for example, a sound mixer may want more explicit control over the set of speakers that is used during a particular panning operation. Some implementations allow a user to generate one- or two-dimensional "logical mappings" between sets of speakers and a panning interface.
[000109] Figure 7 is a flowchart that outlines a process of establishing and using virtual speakers. Figures 8A to 8C show examples of virtual speakers mapped to line endpoints and corresponding speaker responses. Referring first to process 700 of Figure 7, an indication to create virtual speakers is received at block 705. The indication can be received, for example, by a logic system of an authoring apparatus and can correspond to an input received from a user input device.
[000110] At block 710, an indication of a virtual speaker location is received. For example, referring to Figure 8A, a user can use a user input device to position the cursor 510 at the position of the virtual speaker 805a and to select that location, for example, by means of a mouse click. At block 715, it is determined (e.g. according to user input) whether additional virtual speakers will be selected. In this example, the process reverts to block 710 and the user selects the position of the virtual speaker 805b, shown in Figure 8A.
[000111] In this case, the user only wants to establish two virtual speaker locations. Therefore, at block 715, it is determined (for example, according to user input) that no additional virtual speakers will be selected. A polygonal line 810 can be displayed, as shown in Figure 8A, to connect the virtual speaker positions 805a and 805b.
In some implementations, the position of the audio object 505 will be restricted to the polygonal line 810. In some implementations, the position of the audio object 505 may be restricted to a parametric curve. For example, a set of control points can be provided according to user input and a curve-fitting algorithm, such as a spline, can be used to determine the parametric curve. At block 725, an indication of an audio object position along the polygonal line 810 is received. In some such implementations, the position will be indicated as a scalar value between zero and one. At block 725, the (x, y, z) coordinates of the audio object and the polygonal line defined by the virtual speakers can be displayed. The audio data and associated metadata, including the scalar position and the (x, y, z) coordinates of the virtual speakers, can be saved (block 727). Here, the audio data and metadata can be sent to a rendering tool via an appropriate communication protocol at block 728.
[000112] At block 729, it is determined whether the authoring process will continue. If it does not continue, process 700 may end (block 730) or may continue to rendering operations, according to user input. As noted above, however, in many implementations at least some rendering operations can be performed concurrently with the authoring operations.
[000113] At block 732, the audio data and metadata are received by the rendering tool. At block 735, the gains to be applied to the audio data are computed for each virtual speaker position. Figure 8B shows the speaker responses for the virtual speaker position 805a. Figure 8C shows the speaker responses for the virtual speaker position 805b. In this example, as in many other examples described in this document, the indicated speaker responses are for playback speakers that have locations corresponding to the locations shown for the speaker zones of the GUI 400. Here, the virtual speakers 805a and 805b, and the line 810, have been positioned in a plane that is not close to the playback speakers that have locations corresponding to speaker zones 8 and 9. Therefore, no gain for these speakers is indicated in Figure 8B or 8C.
[000114] When the user moves the audio object 505 to other positions along the line 810, the logic system will compute a cross-fade corresponding to these positions (block 740), for example, according to the scalar position parameter of the audio object. In some implementations, a pairwise panning law (for example, an energy-preserving sine or power law) can be used to blend between the gains to be applied to the audio data for the virtual speaker position 805a and the gains to be applied to the audio data for the virtual speaker position 805b.
[000115] At block 742, it can then be determined (e.g. according to user input) whether or not to continue process 700. A user can, for example, be presented (e.g. via a GUI) with the option of continuing the rendering operations or reverting to the authoring operations. If it is determined that process 700 will not continue, the process ends (block 745).
[000116] When panning rapidly moving audio objects (for example, audio objects that correspond to cars, jets, etc.), it can be difficult to author a smooth trajectory if audio object positions are selected by a user one point at a time. The lack of smoothness in the audio object trajectory can influence the perceived sound image. Consequently, some authoring implementations provided in this document apply a low-pass filter to the position of an audio object in order to smooth the resulting panning gains. Alternative authoring implementations apply a low-pass filter to the gain applied to the audio data.
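The text does not specify a particular filter, but a simple one-pole smoother such as the following Python sketch conveys the idea of low-pass filtering an authored position track before the panning gains are computed; the filter choice and the value of alpha are assumptions.

def smooth_positions(positions, alpha=0.2):
    """One-pole low-pass filter over a sequence of (x, y, z) object positions.

    A generic illustration of the idea in paragraph [000116]; the patent does
    not prescribe this filter or the value of alpha (smaller alpha = smoother,
    more sluggish trajectory).
    """
    if not positions:
        return []
    smoothed = [positions[0]]
    for pos in positions[1:]:
        prev = smoothed[-1]
        smoothed.append(tuple(alpha * p + (1.0 - alpha) * q for p, q in zip(pos, prev)))
    return smoothed

# Example: a jumpy, point-by-point trajectory becomes a smoother path whose
# panning gains will also vary more smoothly.
raw = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.5)]
print(smooth_positions(raw))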
[000117] Other authoring implementations may allow a user to grab, push, throw or otherwise interact with audio objects. Some such implementations may involve the application of simulated physical laws, such as rule sets used to describe velocity, acceleration, momentum, kinetic energy, the application of forces, etc.
[000118] Figures 9A to 9C show examples of using a virtual cable to drag an audio object. In Figure 9A, a virtual cable 905 has been formed between the audio object 505 and the cursor 510. In this example, the virtual cable 905 has a virtual spring constant. In some such implementations, the virtual spring constant may be selectable according to user input.
[000119] Figure 9B shows the audio object 505 and the cursor 510 at a subsequent time, after the user has moved the cursor 510 towards speaker zone 3. The user may have moved the cursor 510 using a mouse, a controller, a trackball, a gesture detection device, or another type of user input device. The virtual cable 905 has been stretched and the audio object 505 has been moved near speaker zone 8. The audio object 505 is approximately the same size in Figures 9A and 9B, which indicates (in this example) that the elevation of the audio object 505 has not changed substantially.
[000120] Figure 9C shows the audio object 505 and the cursor 510 at a later time, after the user has moved the cursor around speaker zone 9. The virtual cable 905 has been stretched even further. The audio object 505 has been moved downward, as indicated by the decrease in the size of the audio object 505. The audio object 505 has been moved in a smooth arc. This example illustrates a potential benefit of such implementations, which is that the audio object 505 can be moved along a smoother trajectory than if a user were merely selecting positions for the audio object 505 point by point.
[000121] Figure 10A is a flowchart that outlines a process of using a virtual cable to move an audio object. Process 1000 begins with block 1005, in which audio data is received. At block 1007, an indication to attach a virtual cable between an audio object and a cursor is received. The indication may be received by a logic system of an authoring apparatus and may correspond to an input received from a user input device. Referring to Figure 9A, for example, a user may position the cursor 510 over the audio object 505 and then indicate, via a user input device or a GUI, that the virtual cable 905 should be formed between the cursor 510 and the audio object 505. Cursor and object position data can be received (block 1010).
[000122] In this example, cursor velocity and/or acceleration data can be computed by the logic system according to the cursor position data as the cursor 510 is moved (block 1015). The position data and/or trajectory data for the audio object 505 can be computed according to the virtual spring constant of the virtual cable 905 and the cursor position, velocity and acceleration data. Some such implementations may involve assigning a virtual mass to the audio object 505 (block 1020). For example, if the cursor 510 is moved at a relatively constant speed, the virtual cable 905 may not stretch and the audio object 505 may be pulled along at a relatively constant speed. If the cursor 510 accelerates, the virtual cable 905 can be stretched and a corresponding force can be applied to the audio object 505 through the virtual cable 905. There may be a time delay between the acceleration of the cursor 510 and the force applied through the virtual cable 905. In alternative implementations, the position and/or trajectory of the audio object 505 may be determined differently, for example, without assigning a virtual spring constant to the virtual cable 905, by applying friction and/or inertia to the audio object 505, etc.
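As a toy illustration of the virtual cable described in paragraphs [000117] to [000122], the Python sketch below advances a tethered audio object by one time step using a spring force, a virtual mass and a damping term standing in for friction and inertia. The constants and the explicit Euler integration are assumptions, not details of the patent.

def step_tethered_object(obj_pos, obj_vel, cursor_pos, dt=0.02,
                         spring_k=8.0, mass=1.0, damping=0.9):
    """Advance an audio object one time step while tethered to the cursor.

    The virtual cable behaves like a spring pulling the object toward the
    cursor; the object has a virtual mass, and damping approximates friction.
    """
    force = tuple(spring_k * (c - p) for c, p in zip(cursor_pos, obj_pos))
    accel = tuple(f / mass for f in force)
    new_vel = tuple(damping * (v + a * dt) for v, a in zip(obj_vel, accel))
    new_pos = tuple(p + v * dt for p, v in zip(obj_pos, new_vel))
    return new_pos, new_vel

# Example: sample the object position at fixed intervals (compare block 1030)
# while the cursor moves steadily to the right; the object follows on a smooth path.
pos, vel = (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
for i in range(1, 51):
    cursor = (i * 0.02, 0.5, 0.0)
    pos, vel = step_tethered_object(pos, vel, cursor)
print(tuple(round(c, 3) for c in pos))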
If the cursor 510 accelerates, the virtual cable 905 can be stretched and a corresponding force can be applied to the audio object 505 through the virtual cable 905. There may be a time delay between the acceleration of the cursor 510 and the force applied through the virtual cable 905. In alternative deployments, the position and/or trajectory of the audio object 505 may be determined differently, for example, without assigning a virtual spring constant to the virtual cable 905, by applying friction and/or inertia to the audio object 505, etc. [000123] The positions and/or trajectories of the audio object 505 and the cursor 510 can be displayed (block 1025). In this example, the logic system samples audio object positions at a time interval (block 1030). In some such deployments, the user can determine the time interval for sampling. Audio object trajectory and/or location metadata, etc. can be saved (block 1034). [000124] At block 1036 it is determined whether this authoring mode will continue. The process can continue if the user so desires, for example by reverting to block 1005 or to block 1010. Otherwise, process 1000 can terminate (block 1040). [000125] Figure 10B is a flowchart that highlights an alternative process for using a virtual cable to move an audio object. Figures 10C through 10E show examples of the process highlighted in Figure 10B. Referring first to Figure 10B, process 1050 begins with block 1055, in which audio data is received. At block 1057, an indication is received to secure a virtual cable between an audio object and a cursor. The indication may be received by a logical system of an authoring apparatus and may correspond to an input received from a user input device. Referring to Figure 10C, for example, a user can position cursor 510 over audio object 505 and then indicate, via a user input device or a GUI, that virtual cable 905 should be formed between the cursor 510 and the audio object 505. [000126] Cursor position and audio object data can be received at block 1060. At block 1062, the logic system can receive an indication (via a user input device or a GUI, for example) that the audio object 505 is to be held at an indicated position, for example, a position indicated by cursor 510. At block 1065, the logic system receives an indication that cursor 510 has been moved to a new position, which can be displayed along with the position of the audio object 505 (block 1067). Referring to Figure 10D, for example, cursor 510 has been moved from the left side to the right side of the virtual playback environment 404. However, the audio object 505 is still held in the same position indicated in Figure 10C. As a result, virtual cable 905 has been substantially stretched. [000127] At block 1069, the logic system receives an indication (via a user input device or a GUI, for example) that the audio object 505 will be released. The logic system can compute the resulting audio object position and/or trajectory data, which can be displayed (block 1075). The resulting display might look similar to the one shown in Figure 10E, which shows the audio object 505 moving smoothly and quickly through the virtual playback environment 404. The logic system can save the audio object location and/or trajectory metadata in a memory system (block 1080). [000128] At block 1085, it is determined whether the authoring process 1050 will continue. The process can continue if the logical system receives an indication that the user wishes to do so.
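As a brief, hedged sketch of the spring-and-mass behaviour described in blocks 1015 and 1020 and illustrated in Figures 10C through 10E, a logic system might integrate the object position from sampled cursor positions as follows. The spring constant, damping, mass and time step values below are illustrative assumptions, not values from the source.

```python
import numpy as np

def drag_with_virtual_cable(cursor_path, k=40.0, damping=6.0, mass=1.0, dt=0.02):
    """Integrate an audio object's position pulled by a virtual spring (cable).

    cursor_path: sequence of (x, y, z) cursor positions sampled every dt seconds.
    Returns the simulated object positions; a stiffer spring (larger k) tracks
    the cursor more tightly, while more damping smooths the resulting path.
    """
    pos = np.array(cursor_path[0], dtype=float)   # object starts at the cursor
    vel = np.zeros(3)
    trajectory = [pos.copy()]
    for cursor in cursor_path[1:]:
        spring_force = k * (np.asarray(cursor, dtype=float) - pos)
        drag_force = -damping * vel
        accel = (spring_force + drag_force) / mass
        vel += accel * dt
        pos += vel * dt
        trajectory.append(pos.copy())
    return trajectory

# A cursor that jumps produces a smoothly arcing object path instead of a jump.
cursor = [(0, 0, 0)] * 5 + [(1, 1, 0)] * 50
print(drag_with_virtual_cable(cursor)[-1])
```

A stiffer virtual spring tracks the cursor more closely, while heavier damping or a larger virtual mass yields the smoother, arcing paths described above.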
For example, process 1050 may continue by reverting to block 1055 or block 1060. Otherwise, the authoring tool may send the audio data and metadata to a rendering tool (block 1090), after which process 1050 can terminate (block 1095). [000129] In order to optimize the likelihood of perceived motion of an audio object, it may be desirable to let the user of an authoring tool (or a rendering tool) select a subset of the speakers in a playback environment and limit the set of active speakers to the chosen subset. In some deployments, speaker zones and/or speaker zone groups can be designated as active or inactive during an authoring or rendering operation. For example, with reference to Figure 4A, the speaker zones of the front area 405, the left area 410, the right area 415 and/or the upper area 420 can be controlled as a group. The speaker zones of a rear area that includes speaker zones 6 and 7 (and, in other deployments, one or more other speaker zones located between speaker zones 6 and 7) can also be controlled as a group. A user interface can be provided to dynamically enable or disable all speakers that correspond to a particular speaker zone or to an area that includes a plurality of speaker zones. [000130] In some deployments, the logical system of an authoring device (or a rendering device) can be configured to create speaker zone restriction metadata based on user input received through a user input system. The speaker zone restriction metadata can include data to disable selected speaker zones. Some such deployments will now be described with reference to Figures 11 and 12. [000131] Figure 11 shows an example of applying a speaker zone restriction in a virtual playback environment. In some such deployments, a user may have the ability to select speaker zones by clicking their representations in a GUI, such as the GUI 400, using a user input device such as a mouse. Here, a user has disabled speaker zones 4 and 5 on the sides of the virtual playback environment 404. Speaker zones 4 and 5 may correspond to most (or all) of the speakers in a physical playback environment, such as a theater sound system environment. In this example, the user has also constrained the positions of the audio object 505 to positions along line 1105. With most or all speakers along the side walls disabled, a pan from the screen 150 to the back of the virtual playback environment 404 can be restricted so as not to use the side speakers. This can create enhanced front-to-back perceived movement for a wide audience area, particularly for audience members who sit close to the playback speakers that correspond to speaker zones 4 and 5. [000132] In some deployments, speaker zone restrictions can be carried through all re-rendering modes. For example, speaker zone restrictions can be applied in situations when fewer zones are available to render, for example, when rendering to a Dolby Surround 7.1 or 5.1 configuration that exposes only 7 or 5 zones. Speaker zone restrictions can also be applied when more zones are available to render. As such, speaker zone restrictions can also be seen as a way to guide re-rendering, providing a non-blind solution to the traditional "mix up/mix down" process. [000133] Figure 12 is a flowchart that highlights some examples of applying speaker zone restriction rules. Process 1200 begins with block 1205, in which one or more indications are received to apply speaker zone restriction rules.
The indication(s) may be received by a logical system of an authoring or rendering device and may correspond to input received from a user input device. For example, the indications can correspond to a user's selection of one or more speaker zones to disable. In some implementations, block 1205 may involve receiving an indication of what type of speaker zone restriction rules should apply, for example, as described below. [000134] At block 1207, the audio data are received by an authoring tool. Audio object position data can be received (block 1210), for example, according to an input from a user of the authoring tool, and displayed (block 1215). The position data are (x, y, z) coordinates in this example. Here, the active and inactive speaker zones for the selected speaker zone restriction rules are also displayed at block 1215. At block 1220, the audio data and associated metadata are saved. In this example, the metadata include the audio object position and speaker zone restriction metadata, which can include a speaker zone identification flag. [000135] In some deployments, the speaker zone restriction metadata may indicate that a rendering tool should apply panning equations to compute gains in a binary fashion, for example, by treating all speakers in the selected (disabled) speaker zones as being "off" and all other speaker zones as being "on". The logic system can be configured to create speaker zone restriction metadata that includes data to disable selected speaker zones. [000136] In alternative deployments, the speaker zone restriction metadata may indicate that the rendering tool can apply panning equations to compute gains in a blended manner that includes some degree of contribution from the disabled speaker zones. For example, the logic system can be configured to create speaker zone restriction metadata that indicates that the rendering tool should attenuate the selected speaker zones by performing the following operations: compute first gains that include contributions from the selected (disabled) speaker zones; compute second gains that do not include contributions from the selected speaker zones; and blend the first gains with the second gains. In some deployments, an adjustment can be applied to the first gains and/or the second gains (for example, from a selected minimum value to a selected maximum value) to allow for a range of potential contributions from the selected speaker zones. [000137] In this example, the authoring tool sends the audio data and metadata to a rendering tool at block 1225. The logical system can then determine whether the authoring process will continue (block 1227). The authoring process can continue if the logical system receives an indication that the user wishes to do so. Otherwise, the authoring process may end (block 1229). In some deployments, rendering operations may continue, based on user input. [000138] The audio objects, including the audio data and metadata created by the authoring tool, are received by the rendering tool at block 1230. Position data for a particular audio object are received at block 1235 in this example. The rendering tool's logic system can apply panning equations to compute gains for the audio data, based on the audio object position, according to the speaker zone restriction rules. [000139] At block 1245, the computed gains are applied to the audio data. The logic system can save the gains, the audio object location, and the speaker zone restriction metadata in a memory system.
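As a hedged illustration of the blended constraint described in paragraph [000136] (a sketch only, not the exact formula used by the described system; the function name, the renormalization step and the meaning of the `contribution` parameter are assumptions), the two gain sets can be mixed by an adjustable weight:

```python
import numpy as np

def constrained_gains(gains_all, gains_enabled_only, contribution=0.0):
    """Blend gains computed with all zones against gains computed without
    the disabled zones.

    gains_all:          gains including contributions to the disabled zones.
    gains_enabled_only: gains recomputed with the disabled zones left out
                        (their entries are zero).
    contribution:       0.0 -> binary behaviour (disabled zones fully off);
                        1.0 -> restriction ignored. Intermediate values allow
                        a limited contribution from the disabled zones.
    """
    c = np.clip(contribution, 0.0, 1.0)
    blended = c * np.asarray(gains_all) + (1.0 - c) * np.asarray(gains_enabled_only)
    # Renormalize so the overall power of the blend stays constant.
    norm = np.linalg.norm(blended)
    return blended / norm if norm > 0 else blended

g_all = np.array([0.5, 0.5, 0.7, 0.1])   # gains with zones 3 and 4 included
g_on = np.array([0.6, 0.8, 0.0, 0.0])    # gains with zones 3 and 4 disabled
print(constrained_gains(g_all, g_on, contribution=0.25))
```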
In some deployments, the audio data may be played through a speaker system. Corresponding speaker responses may be shown on a display in some deployments. [000140] At block 1248, it is determined whether process 1200 will continue. The process can continue if the logical system receives an indication that the user wishes to do so. For example, the rendering process may continue by reverting to block 1230 or block 1235. If an indication is received that a user wishes to revert to the corresponding authoring process, the process may revert to block 1207 or block 1210. Otherwise, process 1200 may terminate (block 1250). [000141] The tasks of placing and rendering audio objects in a three-dimensional virtual playback environment are becoming increasingly difficult. Part of the difficulty relates to challenges in representing the virtual playback environment in a GUI. Some authoring and rendering implementations provided in this document allow a user to switch between a two-dimensional pan in screen space and a three-dimensional pan in screen space. Such functionality can help preserve the accuracy of an audio object's placement while providing a user-friendly GUI. [000142] Figures 13A and 13B show an example of a GUI that can switch between a two-dimensional view and a three-dimensional view of a virtual playback environment. Referring first to Figure 13A, the GUI 400 depicts an image 1305 on the screen. In this example, image 1305 is that of a saber-toothed tiger. In this top view of the virtual playback environment 404, a user can readily observe that the audio object 505 is close to speaker zone 1. Elevation can be inferred, for example, from the size, color, or some other attribute of audio object 505. However, the position of the audio object relative to image 1305 may be difficult to determine in this view. [000143] In this example, the GUI 400 may appear to be dynamically rotated around a geometric axis, such as the axis 1310. Figure 13B shows the GUI 1300 after the rotation process. In this view, a user can see image 1305 more clearly and can use information from image 1305 to position audio object 505 more precisely. In this example, the audio object corresponds to a sound toward which the saber-toothed tiger is looking. Being able to switch between the top view and a screen view of the virtual playback environment 404 allows a user to quickly and accurately select the appropriate elevation for the audio object 505, using information from the on-screen material. [000144] Various other convenient GUIs for authoring and/or rendering are provided in this document. Figures 13C through 13E show combinations of two-dimensional and three-dimensional depictions of playback environments. Referring first to Figure 13C, a top view of the virtual playback environment 404 is depicted in a left area of the GUI 1310. The GUI 1310 also includes a three-dimensional depiction 1345 of a virtual (or real) playback environment. Area 1350 of the three-dimensional depiction 1345 corresponds to screen 150 of the GUI 400. The position of audio object 505, particularly its elevation, can be clearly seen in the three-dimensional depiction 1345. In this example, the width of audio object 505 is also shown in the three-dimensional depiction 1345.
In some implementations, the speaker arrangement 1320 can, for example, represent playback speaker locations of an actual playback environment, such as a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration, a Dolby 7.1 configuration plus suspended (overhead) speakers, etc. When a logic system receives an indication of a position of audio object 505 in the virtual playback environment 404, the logic system can be configured to map that position to gains for the speaker locations 1324 through 1340 of the speaker arrangement 1320, for example, through the pan-amplitude positioning process described above. For example, in Figure 13C, speaker locations 1325, 1335, and 1337 each have a color change that indicates gains corresponding to the position of audio object 505. [000146] Referring now to Figure 13D, the audio object has been moved to a position behind the screen 150. For example, a user may have moved the audio object 505 by placing a cursor on the audio object 505 in the GUI 400 and dragging it to a new position. This new position is also shown in the three-dimensional depiction 1345, which has been rotated to a new orientation. The responses of the speaker arrangement 1320 may look substantially the same in Figures 13C and 13D. However, in an actual GUI, speaker locations 1325, 1335, and 1337 may have a different appearance (such as a different color or brightness) to indicate corresponding gain differences caused by the new position of the audio object 505. [000147] Referring now to Figure 13E, audio object 505 has been quickly moved to a position in the rear right portion of the virtual playback environment 404. At the moment depicted in Figure 13E, speaker location 1326 responds to the current position of audio object 505, and speaker locations 1325 and 1337 still respond to the previous position of audio object 505. [000148] Figure 14A is a flowchart that highlights a process of controlling an apparatus to present GUIs such as those shown in Figures 13C to 13E. Process 1400 begins with block 1405, at which one or more indications are received to display audio object locations, speaker zone locations, and playback speaker locations for a playback environment. The speaker zone locations can correspond to a virtual playback environment and/or an actual playback environment, for example, as shown in Figures 13C to 13E. The indication(s) may be received by a logical system of a rendering and/or authoring device and may correspond to an input received from a user input device. For example, the indications might correspond to a user selection of a playback environment configuration. [000149] At block 1407, audio data are received. Audio object position and width data are received at block 1410, for example, according to a user input. At block 1415, the audio object, the speaker zone locations, and the playback speaker locations are displayed. The audio object position can be displayed in two-dimensional and/or three-dimensional views, for example, as shown in Figures 13C to 13E. The width data can be used not only for audio object rendering, but can also affect how the audio object is displayed (see the depiction of audio object 505 in the three-dimensional depiction 1345 of Figures 13C to 13E). [000150] Audio data and associated metadata can be saved (block 1420). At block 1425, the authoring tool sends the audio data and metadata to a rendering tool. The logic system can then determine (block 1427) whether the authoring process will continue.
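Returning to the gain mapping of paragraph [000145], the following is a deliberately simplified, distance-based sketch of turning an object position into per-speaker gains. It is an assumption for illustration only and is not the specific amplitude panning law used by the described system; the speaker coordinates and the rolloff exponent are likewise made up.

```python
import numpy as np

def position_to_gains(object_pos, speaker_positions, rolloff=2.0):
    """Map an (x, y, z) object position to per-speaker gains.

    Gains fall off with distance to each speaker and are normalized so the
    sum of squared gains is 1 (constant total power). This is only a toy
    stand-in for a real amplitude panning law.
    """
    obj = np.asarray(object_pos, dtype=float)
    spk = np.asarray(speaker_positions, dtype=float)
    dist = np.linalg.norm(spk - obj, axis=1)
    gains = 1.0 / np.maximum(dist, 1e-3) ** rolloff
    return gains / np.linalg.norm(gains)

speakers = [(0, 1, 0), (1, 1, 0), (0, 0, 0), (1, 0, 0)]  # hypothetical corners
print(position_to_gains((0.2, 0.9, 0.0), speakers))       # object near front-left
```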
The authoring process can continue (for example, by reverting to block 1405) if the logical system receives an indication that the user wishes to do so. Otherwise, the authoring process may end (block 1429). [000151] The audio objects, which include the audio data and metadata created by the authoring tool, are received by the rendering tool at block 1430. Position data for a particular audio object are received at block 1435 in this example. The rendering tool's logic system can apply panning equations to compute gains for the audio data, based on the audio object position, according to the width metadata. [000152] In some rendering deployments, the logic system may map the speaker zones to playback speakers of the playback environment. For example, the logic system can access a data structure that includes speaker zones and corresponding playback speaker locations. Further details and examples are described below with reference to Figure 14B. [000153] In some deployments, panning equations can be applied, for example, by a logical system, according to the audio object position, width and/or other information, such as the speaker locations of the playback environment (block 1440). At block 1445, the audio data are processed in accordance with the gains obtained at block 1440. At least part of the resulting audio data can be stored, if so desired, along with the audio object position data and other corresponding metadata received from the authoring tool. The audio data can be played through speakers. [000154] The logic system can then determine (block 1448) whether process 1400 will continue. Process 1400 can continue if, for example, the logical system receives an indication that the user wishes to do so. Otherwise, process 1400 may terminate (block 1449). [000155] Figure 14B is a flowchart that highlights a process of rendering audio objects for a playback environment. Process 1450 starts with block 1455, in which one or more indications are received to render audio objects for a playback environment. The indication(s) may be received by a logical system of a rendering apparatus and may correspond to input received from a user input device. For example, the indications might correspond to a user selection of a playback environment configuration. [000156] At block 1457, audio playback data (which include one or more audio objects and associated metadata) are received. The reproduction environment data may be received at block 1460. The reproduction environment data may include an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment. The playback environment can be a theater sound system environment, a home theater environment, etc. In some deployments, the playback environment data may include playback speaker zone layout data that indicate playback speaker zones and playback speaker locations that correspond to the speaker zones. [000157] The playback environment may be displayed at block 1465. In some implementations, the playback environment may be displayed in a manner similar to the speaker arrangement 1320 shown in Figures 13C through 13E.
In some deployments, metadata associated with audio objects may have been authored in a manner such as the same as described above, such that the metadata may include gain data that corresponds to speaker zones (for example, that corresponds to speaker zones 1 to 9 of the GUI 400). The logic system can map the speaker zones to playback speakers of the playback environment. For example, the logic system can access a data structure, stored in memory, that includes speaker zones and corresponding playback speaker locations. The rendering device can have a variety of such data structures, each of which corresponds to a different speaker configuration. In some deployments, a renderer may have such data structures for a variety of standard playback environment settings, such as a Dolby Surround 5.1 setting, a Dolby Surround 7.1 setting, and/or a Hamasaki 22.2 surround sound setting. [000159] In some deployments, the metadata for the audio objects may include other information from the authoring process. For example, metadata can include speaker restriction data. Metadata can include information to map an audio object position to a single playback speaker location or a single playback speaker zone. Metadata can include data that constrains an audio object's position to a one-dimensional curve or two-dimensional surface. Metadata can include the trajectory data for an audio object. Metadata can include an identifier for type of content (eg dialog, music or effects). [000160] Consequently, the rendering process may involve using the metadata, for example, to enforce speaker zone restrictions. In some such deployments, the rendering apparatus may provide a user with the option to modify restrictions indicated by the metadata, for example, to modify speaker restrictions and re-render accordingly. Rendering can involve creating an aggregate gain based on one or more of a desired audio object position, a distance from the desired audio object position to a reference position, a speed of an audio object, or a content type. of audio object. The corresponding responses from the playback speakers can be displayed. (block 1475). In some deployments, the logic system can control speakers to reproduce a sound that matches the results of the rendering process. [000161] At block 1480, the logic system can determine whether process 1450 will continue. Process 1450 can continue if, for example, the logical system receives an indication that the user wants to do the same. For example, process 1450 may continue to revert to block 1457 or block 1460. Otherwise, process 1450 may terminate (block 1485). [000162] Apparent font width and spread control are features of some existing surround sound authoring/rendering systems. In this description, the term "scattering" refers to distributing the same signal over multiple speakers to blur the sound image. The term "width" refers to de-correlating the output signals for each channel for apparent width control. Width can be an additional scaled value that controls the amount of decorrelation applied to each speaker power signal. [000163] Some implementations described in this document provide scattering control oriented on a 3D geometric axis. Such an implementation will now be described with reference to Figures 15A and 15B. Figure 15A shows an example of an audio object and an associated audio object width in a virtual playback environment. Here, GUI 400 indicates an ellipsoid 1505 that extends around the audio object 505, which indicates the audio object's width. 
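As one hedged sketch of the kind of zone-to-speaker data structure described above (the zone numbering follows the GUI 400 convention, but the specific speaker labels and the Dolby Surround 7.1-style assignment are assumptions for illustration, not taken from the source):

```python
# Hypothetical mapping from GUI 400 speaker zones to playback speakers of a
# Dolby Surround 7.1-style configuration; a renderer could hold one such
# table per supported playback environment configuration.
ZONE_TO_SPEAKERS_71 = {
    1: ["L"],            # front left zone
    2: ["C"],            # center zone
    3: ["R"],            # front right zone
    4: ["Lss"],          # left side surround
    5: ["Rss"],          # right side surround
    6: ["Lrs"],          # left rear surround
    7: ["Rrs"],          # right rear surround
    8: [],               # overhead zones: unused in a planar 7.1 layout
    9: [],
}

def speakers_for_zones(zone_gains, zone_map):
    """Expand per-zone gains into per-speaker gains using a zone map."""
    speaker_gains = {}
    for zone, gain in zone_gains.items():
        for spk in zone_map.get(zone, []):
            speaker_gains[spk] = speaker_gains.get(spk, 0.0) + gain
    return speaker_gains

print(speakers_for_zones({1: 0.7, 2: 0.5, 4: 0.2}, ZONE_TO_SPEAKERS_71))
```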
Audio object width can be indicated by the audio object metadata and/or received according to user input. In this example, the x and y dimensions of the ellipsoid 1505 are different, but in other deployments these dimensions may be the same. The z dimension of the ellipsoid 1505 is not shown in Figure 15A. [000164] Figure 15B shows an example of a spread profile that corresponds to the audio object width shown in Figure 15A. The spread can be represented as a three-dimensional vector parameter. In this example, the scattering profile 1507 can be independently controlled along three dimensions, for example, according to a user input. The gains along the x and y geometric axes are represented in Figure 15B by the respective heights of curves 1510 and 1520. The gain for each sample 1512 is also indicated by the size of the corresponding circles 1515 within the scattering profile 1507 and by the gray shading in Figure 15B. [000165] In some deployments, the scattering profile 1507 can be implemented by a separable integral for each geometric axis. According to some implementations, a minimum spread value can be automatically set as a function of speaker placement, to avoid timbre discrepancies when panning. Alternatively, or additionally, a minimum spread value can be automatically set as a function of the speed of the panned audio object, so that as an audio object's speed increases, the object becomes more spatially spread out, similarly to how rapidly moving images on film appear to blur. [000166] When using audio object-based audio rendering implementations such as those described in this document, a potentially large number of audio tracks and accompanying metadata (including, but not limited to, metadata indicating object positions in three-dimensional space) can be delivered unmixed to the playback environment. A real-time rendering tool can use such metadata and information regarding the playback environment to compute the speaker feed signals to optimize the playback of each audio object. [000167] When a large number of audio objects are mixed together in the speaker outputs, an overload can occur either in the digital domain (for example, the digital signal may be clipped before analog conversion) or in the analog domain, when the amplified analog signal is played back through the playback speakers. Both cases can result in audible distortion, which is undesirable. An overload in the analog domain can also damage the playback speakers. [000168] Consequently, some implementations described in this document involve dynamic object "blobbing" in response to a playback speaker overload. When audio objects are rendered with a given scattering profile, in some deployments power can be directed to an increased number of neighboring playback speakers while maintaining constant overall power. For example, if the power of the audio object were spread evenly across N playback speakers, it could contribute to each playback speaker output with a gain of 1/sqrt(N). This approach provides additional mixing headroom and can alleviate or prevent playback speaker distortion such as clipping. [000169] To use a numerical example, assume that a speaker will clip if it receives an input greater than 1.0. Assume that two objects are to be mixed into speaker A, one at a level of 1.0 and the other at a level of 0.25. If no blobbing is applied, the level mixed into speaker A will total 1.25 and clipping will occur.
However, if the first object is blobbed with another speaker B, then (according to some implementations) each speaker can receive the object at 0.707, resulting in additional empty space in speaker A for mixing additional objects . The second object can then be safely mixed into speaker A without clipping, as the level mixed to speaker A will be 0.707 + 0.25 = 0.957. [000170] In some deployments, during the authoring phase each audio object can be downmixed to a subset of the speaker zones (or all speaker zones) with a given mix gain. A dynamic list of all objects contributing to each speaker can therefore be constructed. In some deployments, this list can be sorted by decreasing energy levels, for example, using the product of the original root mean square (RMS) level of the signal multiplied by the mix gain. In other implementations, the list can be sorted according to other criteria, such as the relative importance assigned to the audio object. [000171] During the rendering process, if an overload is detected for a given playback speaker output, the power of audio objects can be spread over many playback speakers. For example, the energy of audio objects can be spread using a scattering factor or width that is proportional to the amount of overhead and the relative contribution of each audio object to the given playback speaker. If the same audio object contributes to many overhead playback speakers, its spread or width factor may, in some deployments, be additively increased and applied to the next rendered frame of audio data. [000172] Generally, a hard limiter will cut any value that exceeds a threshold for the limit value. As in the example above, if a speaker receives an object mixed at level 1.25, and can only allow a maximum level of 1.0, the object will be "hard limited" to 1.0. A soft limiter will begin to apply a cap before reaching the absolute threshold in order to provide a softer and audibly pleasurable result. Soft limiters can also use a "look ahead" feature to predict when future cuts might occur in order to gently reduce the gain before when cuts can occur and thereby avoid clipping. [000173] Various "blobbing" type implementations provided herein can be used in conjunction with a hard or soft limiter to limit audible distortion while avoiding spatial accuracy/sharpness degradation. As opposed to global spread or the use of limiters only, blobbing-type deployments can selectively target loud objects or objects of a given content type. Such deployments can be controlled by the mixer. For example, if the speaker zone restriction metadata for an audio object indicates that a subset of the playback speakers should not be used, the rendering engine can apply the speaker zone restriction rules correspondents in addition to deploying a blobbing method. [000174] Figure 16 is a flowchart that highlights a process of blobbing the audio objects. Process 1600 begins with block 1605, where one or more indications are received to activate an audio object blobbing functionality. The indication(s) may be received by a logical system from a rendering apparatus and may correspond to an input received from a user input device. In some deployments, referrals might include a user selection of a replay environment configuration. In alternate deployments, the user can previously select a replay environment setting. [000175] At block 1607, audio playback data (including one or more audio objects and associated metadata) is received. 
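The following is a hedged sketch of this behaviour that reproduces the numerical example above; the clip threshold, the neighbor-count policy and all names are assumptions for illustration, and real implementations would operate on the sorted per-speaker object lists and limiters described in the surrounding paragraphs.

```python
import math

CLIP_THRESHOLD = 1.0  # assumed level above which a speaker output clips

def blob_until_safe(object_levels, neighbor_count=1):
    """Spread the loudest objects over neighboring speakers until the mix fits.

    object_levels: levels of the objects destined for one playback speaker,
    e.g. [1.0, 0.25]. Each blobbed object is split over (1 + neighbor_count)
    speakers at constant total power, i.e. level / sqrt(1 + neighbor_count)
    per speaker. Returns (levels kept on this speaker, levels sent to each
    blobbed neighbor).
    """
    levels = sorted(object_levels, reverse=True)   # loudest objects first
    neighbor_levels = []
    for i, level in enumerate(levels):
        if sum(levels) <= CLIP_THRESHOLD:
            break                                  # mix now fits without clipping
        spread = level / math.sqrt(1 + neighbor_count)
        levels[i] = spread                         # this speaker's share
        neighbor_levels.append(spread)             # each neighbor's share
    return levels, neighbor_levels

# The numerical example: objects at 1.0 and 0.25 would mix to 1.25 and clip.
# Blobbing the 1.0 object over one neighbor leaves 0.707 + 0.25 = 0.957 here.
kept, to_neighbors = blob_until_safe([1.0, 0.25], neighbor_count=1)
print(round(sum(kept), 3), [round(x, 3) for x in to_neighbors])  # 0.957 [0.707]
```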
In some deployments, the metadata might include speaker zone restriction metadata, for example, as described above. In this example, audio object position, time and spread data are parsed from the audio playback data (or otherwise received, for example, via input from a user interface) at block 1610. [000176] Playback speaker responses are determined for the playback environment configuration by applying panning equations to the audio object data, for example, as described above (block 1612). At block 1615, the playback speaker responses and the audio object position are displayed. The playback speaker responses can also be played through speakers that are configured to communicate with the logic system. [000177] At block 1620, the logic system determines whether or not an overload is detected for any playback speaker in the playback environment. If so, audio object blobbing rules such as those described above may be applied until no overload is detected (block 1625). The output audio data can be saved at block 1630, if desired, and can be output to the playback speakers. [000178] At block 1635, the logic system can determine whether or not process 1600 will continue. Process 1600 can continue if, for example, the logical system receives an indication that the user wishes to continue. For example, process 1600 may continue by reverting to block 1607 or block 1610. Otherwise, process 1600 may terminate (block 1640). [000179] Some implementations provide extended panning gain equations that can be used to form an image of an audio object's position in three-dimensional space. Some examples will now be described with reference to Figures 17A and 17B. Figures 17A and 17B show examples of an audio object position in a three-dimensional virtual playback environment. Referring first to Figure 17A, the position of the audio object 505 can be seen in the virtual playback environment 404. In this example, speaker zones 1 to 7 lie in one plane and speaker zones 8 and 9 lie in another plane, as shown in Figure 17B. However, the numbers of speaker zones, planes, etc. are merely given as examples; the concepts described in this document can be extended to different numbers of speaker zones (or individual speakers) and to more than two elevation planes. [000180] In this example, an elevation parameter "z", which can lie in the range zero to 1, maps the position of an audio object onto the elevation planes. In this example, the value z = 0 corresponds to the base plane that includes speaker zones 1 through 7, while the value z = 1 corresponds to the overhead plane that includes speaker zones 8 and 9. Values of z between zero and 1 correspond to a blend between a sound image generated using only the speakers in the base plane and a sound image generated using only the speakers in the overhead plane. [000181] In the example shown in Figure 17B, the elevation parameter for audio object 505 has a value of 0.6. Consequently, in one deployment, a first sound image can be generated using panning equations for the base plane, according to the (x, y) coordinates of the audio object 505 in the base plane. A second sound image can be generated using panning equations for the overhead plane, according to the (x, y) coordinates of the audio object 505 in the overhead plane.
An energy-preservation or amplitude-preservation function of the elevation z can be applied. For example, assuming z can lie in the range zero to one, the gain values of the first sound image can be multiplied by cos(z·π/2) and the gain values of the second sound image can be multiplied by sin(z·π/2), so that the sum of their squares is 1 (energy preservation). [000182] Other implementations described in this document may involve computing gains based on two or more panning techniques and creating an aggregate gain based on one or more parameters. The parameters can include one or more of the following: the desired audio object position; the distance from the desired audio object position to a reference position; the velocity or speed of the audio object; or the audio object content type. [000183] Some of these deployments will now be described with reference to Figures 18 et seq. Figure 18 shows examples of zones that correspond to different panning modes. The sizes, shapes and extents of these zones are merely given as examples. In this example, near-field panning methods are applied to audio objects that lie in zone 1805 and far-field panning methods are applied to audio objects that lie in zone 1815, outside of zone 1810. [000184] Figures 19A to 19D show examples of applying near-field and far-field panning techniques to audio objects in different locations. Referring first to Figure 19A, the audio object is substantially outside the virtual playback environment 1900. This location corresponds to zone 1815 of Figure 18. Therefore, one or more far-field panning methods will be applied in this instance. In some deployments, far-field panning methods may be based on vector-based amplitude panning (VBAP) equations that are known to those of ordinary skill in the art. For example, far-field panning methods can be based on the VBAP equations described in Section 2.3, page 4 of V. Pulkki, Compensating Displacement of Amplitude-Panned Virtual Sources (AES International Conference on Virtual, Synthetic and Entertainment Audio), which is incorporated by reference. In alternative deployments, other methods can be used for panning audio objects in the near field and in the far field, for example, methods that involve the synthesis of corresponding acoustic plane or spherical waves. D. de Vries, Wave Field Synthesis (AES Monograph 1999), which is incorporated by reference, describes relevant methods. [000185] Referring now to Figure 19B, the audio object is within the virtual playback environment 1900. This location corresponds to zone 1805 of Figure 18. Therefore, one or more near-field panning methods will be applied in this instance. Some of these near-field panning methods will utilize a number of speaker zones that encompass the audio object 505 in the virtual playback environment 1900. [000186] In some deployments, the near-field panning method may involve "dual balance" panning, combining two sets of gains. In the example depicted in Figure 19B, the first set of gains corresponds to a front/rear balance between two sets of speaker zones that encompass positions of the audio object 505 along the y-axis. The corresponding responses involve all speaker zones of the virtual playback environment 1900, with the exception of speaker zones 1915 and 1960. [000187] In the example depicted in Figure 19C, the second set of gains corresponds to a left/right balance between two sets of speaker zones that encompass positions of the audio object 505 along the x-axis.
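As a hedged sketch of how the two balances of this "dual balance" approach might be combined (the description above does not specify the combination, so the multiplicative combination and the renormalization below are assumptions for illustration only):

```python
import numpy as np

def dual_balance_gains(front_back_gains, left_right_gains):
    """Combine a front/back balance and a left/right balance into one gain set.

    Both inputs are per-zone gain arrays over the same encompassing speaker
    zones. Here they are combined multiplicatively and renormalized to
    constant power; this is only one plausible choice.
    """
    combined = np.asarray(front_back_gains) * np.asarray(left_right_gains)
    norm = np.linalg.norm(combined)
    return combined / norm if norm > 0 else combined

# Four encompassing zones (e.g. front-left, front-right, back-left, back-right):
fb = np.array([0.8, 0.8, 0.6, 0.6])   # object nearer the front
lr = np.array([0.9, 0.4, 0.9, 0.4])   # object nearer the left
print(dual_balance_gains(fb, lr).round(3))
```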
The corresponding responses involve speaker zones 1905 to 1925. Figure 19D indicates the result of combining the responses indicated in Figures 19B and 19C. [000188] It may be desirable to blend between different panning modes as an audio object enters or leaves the virtual playback environment 1900. Consequently, a blend of gains computed according to near-field panning methods and far-field panning methods is applied to audio objects that lie in zone 1810 (see Figure 18). In some deployments, a pairwise panning law (for example, an energy-preserving sine or power law) can be used to blend between the gains computed according to near-field panning methods and the gains computed according to far-field panning methods. In alternative deployments, the pairwise panning law can be amplitude-preserving rather than energy-preserving, such that the sum of the gains equals one rather than the sum of their squares equaling one. It is also possible to blend the resulting processed signals, for example, by processing the audio signal using both panning methods independently and crossfading the two resulting audio signals. [000189] It may be desirable to provide a mechanism that allows the content creator and/or the content player to easily fine-tune different re-renderings of a given authored trajectory. In the context of film mixing, the concept of screen-to-room energy balance is considered to be important. In some instances, an automatic re-rendering of a given sound path (or pan) will result in a different screen-to-room balance depending on the number of playback speakers in the playback environment. In some deployments, the screen-to-room balance can be controlled based on metadata created during an authoring process. In alternative implementations, the screen-to-room balance can be controlled solely on the rendering side (that is, under content player control), and not in response to metadata. [000190] Accordingly, some deployments described in this document provide one or more forms of screen-to-room balance control. In some of these deployments, the screen-to-room balance can be implemented as a scaling operation. For example, the scaling operation might involve rescaling an audio object's original intended trajectory along the front-to-back direction and/or rescaling the speaker positions used in the renderer to determine the panning gains. In some of these deployments, the screen-to-room balance control can be a variable value between zero and a maximum value (for example, one). The value can, for example, be controlled with a GUI, a virtual or physical slider, a button, etc. [000191] Alternatively, or additionally, the screen-to-room balance control can be implemented using some form of speaker area restriction. Figure 20 indicates speaker zones of a playback environment that can be used in a screen-to-room balance control process. In this example, a front speaker area 2005 and a rear speaker area 2010 (or 2015) can be established. The screen-to-room balance can be adjusted as a function of the selected speaker areas. In some of these deployments, the screen-to-room balance can be implemented as a scaling operation between the front speaker area 2005 and the rear speaker area 2010 (or 2015). In alternative deployments, the screen-to-room balance can be implemented in a binary manner, for example, by allowing a user to select a front-side adjustment, a rear-side adjustment, or no adjustment.
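As a sketch only of the scaling operation described in paragraph [000190] (the coordinate convention, the single `control` parameter and the linear rescaling are assumptions for illustration, not the system's actual formula):

```python
def apply_screen_to_room_bias(trajectory, control):
    """Rescale the front-to-back (y) coordinate of an authored trajectory.

    trajectory: sequence of (x, y, z) positions, with y = 0 at the screen and
                y = 1 at the back of the room.
    control:    value in [0, 1]; 0 leaves the trajectory unchanged, larger
                values pull positions toward the screen (front bias).
    """
    c = min(max(control, 0.0), 1.0)
    return [(x, y * (1.0 - c), z) for (x, y, z) in trajectory]

# A path that reached the back of the room is pulled halfway toward the screen.
path = [(0.5, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 1.0, 0.0)]
print(apply_screen_to_room_bias(path, control=0.5))
```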
The adjustment settings for each case can correspond to predetermined (and usually non-zero) adjustment levels for the front speaker area 2005 and the rear speaker area 2010 (or 2015). In essence, such deployments can provide three presets for the screen-to-room balance control instead of (or in addition to) a scaling operation. [000192] According to some of these implementations, two additional logical speaker zones can be created in an authoring GUI (for example, the GUI 400) by splitting the side walls into a front side wall and a rear side wall. In some deployments, the two additional logical speaker zones correspond to the left wall/left surround and right wall/right surround sound areas of the renderer. Depending on a user's selection of which of these two logical speaker zones are active, the rendering tool can apply predetermined scaling factors (for example, as described above) when rendering to Dolby 5.1 or Dolby 7.1 configurations. The rendering tool can also apply such predetermined scaling factors when rendering for playback environments that do not support the definition of these two extra logical zones, for example because their physical speaker configurations have no more than one physical speaker on the side wall. [000193] Figure 21 is a block diagram that provides examples of components of an authoring and/or rendering apparatus. In this example, device 2100 includes an interface system 2105. The interface system 2105 may include a network interface, such as a wireless network interface. Alternatively, or in addition, the interface system 2105 may include a universal serial bus (USB) interface or another interface. [000194] Device 2100 includes a logic system 2110. The logic system 2110 may include a processor, such as a general-purpose single-chip or multi-chip processor. The logic system 2110 may include a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or combinations thereof. The logic system 2110 can be configured to control the other components of the device 2100. Although no interfaces between the components of the device 2100 are shown in Figure 21, the logic system 2110 can be configured with interfaces to communicate with the other components. The other components may or may not be configured to communicate with one another, as appropriate. [000195] The logic system 2110 can be configured to perform audio authoring and/or rendering functionality, including, but not limited to, the types of audio authoring and/or rendering functionality described in this document. In some of these deployments, the logic system 2110 can be configured to operate (at least in part) in accordance with software stored in one or more non-transient media. The non-transient media may include memory associated with the logic system 2110, such as random access memory (RAM) and/or read-only memory (ROM). The non-transient media may include memory of the memory system 2115. The memory system 2115 may include one or more suitable types of non-transient storage media, such as flash memory, a hard disk, etc. [000196] The display system 2130 may include one or more suitable types of display, depending on the implementation of the device 2100. For example, the display system 2130 may include a liquid crystal display, a plasma display, a bistable display, etc.
[000197] The user input system 2135 may include one or more devices configured to accept input from a user. In some implementations, the user input system 2135 may include a touch-sensitive screen that overlays a display of the display system 2130. The user input system 2135 may include a mouse, a trackball, a gesture detection system, a controller, one or more GUIs and/or menus presented on the display system 2130, buttons, a keyboard, switches, etc. In some deployments, the user input system 2135 may include the microphone 2125: a user may provide voice commands to the device 2100 via the microphone 2125. The logic system may be configured for speech recognition and to control at least some operations of the device 2100 in accordance with such voice commands. [000198] The power system 2140 may include one or more suitable energy storage devices, such as a nickel-cadmium battery or a lithium-ion battery. The power system 2140 can be configured to receive power from an electrical outlet. [000199] Figure 22A is a block diagram that represents some components that can be used for audio content creation. The system 2200 can, for example, be used for creating audio content in mixing studios and/or dubbing stages. In this example, the system 2200 includes an audio and metadata authoring tool 2205 and a rendering tool 2210. In this deployment, the audio and metadata authoring tool 2205 and the rendering tool 2210 include audio connection interfaces 2207 and 2212, respectively, which can be configured for communication via AES/EBU, MADI, analog communication, etc. The audio and metadata authoring tool 2205 and the rendering tool 2210 include network interfaces 2209 and 2217, respectively, which can be configured to send and receive metadata over TCP/IP or any other suitable protocol. The interface 2220 is configured to send audio data to speakers. [000200] The system 2200 may, for example, include an existing authoring system, such as a Pro Tools™ system, that runs a metadata creation tool (that is, a pan positioner as described in this document) as a plugin. The pan positioner can run on a standalone system (for example, a PC or a mixing console) connected to the rendering tool 2210, or it can run on the same physical device as the rendering tool 2210. In the latter case, the pan positioner and the renderer can use a local connection, for example, through shared memory. The pan positioner GUI can also be provided remotely, on a tablet device, a laptop computer, etc. The rendering tool 2210 may comprise a rendering system that includes a sound processor configured to run rendering software. The rendering system can include, for example, a personal computer, a laptop computer, etc., that includes interfaces for audio input/output and an appropriate logic system. [000201] Figure 22B is a block diagram that represents some components that can be used for audio playback in a playback environment (for example, a cinema). The system 2250 includes a cinema server 2255 and a rendering system 2260 in this example. The cinema server 2255 and the rendering system 2260 include network interfaces 2257 and 2262, respectively, which can be configured to send and receive audio objects via TCP/IP or any other suitable protocol. The interface 2264 is configured to output audio data to speakers. [000202] Various modifications to the implementations described in this description may be readily apparent to those of ordinary skill in the art. The general principles defined in this document can be applied to other implementations without departing from the spirit or scope of this description.
Thus, the embodiments are not intended to be limited to the implementations shown herein, but must conform to the broadest scope consistent with this description, the principles and features of the invention disclosed herein.
Claims (11) [0001] 1. A method comprising the steps of: receiving audio reproduction data comprising one or more audio objects and metadata associated with each of the one or more audio objects; receiving playback environment data comprising an indication of a number of playback speakers in the playback environment and an indication of the location of each playback speaker within the playback environment; and rendering the audio objects into one or more speaker power signals by applying a pan-amplitude positioning process to each audio object, where the pan-amplitude positioning process is based, at least in part, on the metadata associated with each audio object and the location of each playback speaker within the playback environment, and where each speaker power signal corresponds to at least one of the playback speakers within the playback environment ; characterized by the fact that the metadata associated with each audio object includes audio object coordinates indicating the intended playback position of the audio object within the playback environment and an alignment indicator indicating whether the pan amplitude positioning process should render the audio object into a single speaker feed signal or apply panning rules to render the audio object into a plurality of speaker feed signals. [0002] 2. Method according to claim 1, characterized in that: the alignment indicator indicates that the panoramic amplitude positioning process must render the audio object into a single speaker power signal; and the pan-amplitude positioning process renders the audio object into a speaker feed signal corresponding to the playback speaker closest to the intended playback position of the audio object. [0003] 3. Method according to claim 1, characterized by the fact that: the alignment indicator indicates that the panoramic amplitude positioning process must render the audio object into a single speaker power signal; a distance between the intended playback position of the audio object and the closest playback speaker to the intended playback position of the audio object exceeds a limit; and the amplitude panning process replaces the alignment indicator and applies panning rules to render the audio object into a plurality of speaker feed signals. [0004] 4. Method, according to claim 2, characterized by the fact that: metadata are time-varying; the audio object coordinates indicating the intended playback position of the audio object within the playback environment differ in a first instant and a second instant; at the first instant the playback speaker closest to the intended playback position of the audio object corresponds to a first playback speaker; at the second time the playback speaker closest to the intended playback position of the audio object corresponds to a second playback speaker; and the pan-amplitude positioning process smoothly transitions between rendering the audio object into a first speaker power signal corresponding to the first playback speaker and rendering the audio object into a second speaker power signal corresponding to the second playback speaker. [0005] 5. 
Method, according to claim 1, characterized by the fact that: the metadata are time-varying; at first the alignment indicator indicates that the pan-amplitude positioning process should render the audio object into a single speaker power signal; in a second instant the alignment indicator indicates that the amplitude panning process must apply panning rules to render the audio object into a plurality of speaker feed signals; and the amplitude panning process smoothly transitions between rendering the audio object into a speaker feed signal corresponding to the closest playback speaker to the intended playback position of the audio object and applying panning rules to render the audio object into a plurality of speaker power signals. [0006] 6. Apparatus comprising: an interface system (2105); and a logical system (2110) configured to: receive, via the interface system (2105), audio reproduction data comprising one or more audio objects and metadata associated with each of the one or more audio objects; receiving, via the interface system (2105), reproduction environment data comprising an indication of a number of reproduction speakers in the reproduction environment and an indication of the location of each reproduction speaker within the reproduction environment ; and rendering the audio objects into one or more speaker power signals by applying a pan-amplitude positioning process to each audio object, where the pan-amplitude positioning process is based, at least in part, on the metadata associated with each audio object and the location of each playback speaker within the playback environment, and where each speaker power signal corresponds to at least one of the playback speakers within the playback environment ; characterized by the fact that the metadata associated with each audio object includes audio object coordinates indicating the intended playback position of the audio object within the playback environment and an alignment indicator indicating whether the pan amplitude positioning process should render the audio object into a single speaker feed signal or apply panning rules to render the audio object into a plurality of speaker feed signals. [0007] 7. Apparatus according to claim 6, characterized in that: the alignment indicator indicates that the panoramic amplitude positioning process must render the audio object into a single speaker power signal; and the pan-amplitude positioning process renders the audio object into a speaker feed signal corresponding to the playback speaker closest to the intended playback position of the audio object. [0008] 8. Apparatus according to claim 6, characterized in that: the alignment indicator indicates that the panoramic amplitude positioning process must render the audio object into a single speaker power signal; a distance between the intended playback position of the audio object and the closest playback speaker to the intended playback position of the audio object exceeds a limit; and the amplitude panning process replaces the alignment indicator and applies panning rules to render the audio object into a plurality of speaker feed signals. [0009] 9. 
Apparatus, according to claim 7, characterized by the fact that: the metadata are time-varying; the audio object coordinates indicating the intended playback position of the audio object within the playback environment differ in a first instant and a second instant; at the first instant the playback speaker closest to the intended playback position of the audio object corresponds to a first playback speaker; at the second time the playback speaker closest to the intended playback position of the audio object corresponds to a second playback speaker; and the pan-amplitude positioning process smoothly transitions between rendering the audio object into a first speaker power signal corresponding to the first playback speaker and rendering the audio object into a second speaker power signal corresponding to the second playback speaker. [0010] 10. Apparatus, according to claim 6, characterized in that: the metadata are time-varying; at first the alignment indicator indicates that the pan-amplitude positioning process should render the audio object into a single speaker power signal; in a second instant the alignment indicator indicates that the amplitude panning process must apply panning rules to render the audio object into a plurality of speaker feed signals; and the amplitude panning process smoothly transitions between rendering the audio object into a speaker feed signal corresponding to the closest playback speaker to the intended playback position of the audio object and applying panning rules to render the audio object into a plurality of speaker power signals. [0011] 11. A non-transient medium having a method stored therein, the method performing the following steps: receiving audio reproduction data comprising one or more audio objects and metadata associated with each of the one or more audio objects; receiving playback environment data comprising an indication of a number of playback speakers in the playback environment and an indication of the location of each playback speaker within the playback environment; and rendering the audio objects into one or more speaker power signals by applying a pan-amplitude positioning process to each audio object, where the pan-amplitude positioning process is based, at least in part, on the metadata associated with each audio object and the location of each playback speaker within the playback environment, and where each speaker power signal corresponds to at least one of the playback speakers within the playback environment ; characterized by the fact that the metadata associated with each audio object includes audio object coordinates indicating the intended playback position of the audio object within the playback environment and an alignment indicator indicating whether the pan amplitude positioning process should render the audio object into a single speaker feed signal or apply panning rules to render the audio object into a plurality of speaker feed signals.